Document Detection Data Preparation

Author

  • Donna K. Harman
Abstract

The document collection needed to reflect the corpus that analysts were expected to see. This meant that a very large collection was needed to test the scaling of the algorithms, with documents from many different domains to test the domain independence of the algorithms. Additionally, the documents selected needed to mirror the different types of documents used in the TIPSTER application. Specifically, they had to vary in length, writing style, level of editing, and vocabulary. As a final requirement, the documents had to cover different time frames to show the effects of document date on the routing task.

Similar resources

Document Image Dewarping Based on Text Line Detection and Surface Modeling (RESEARCH NOTE)

Document images produced by a scanner or digital camera usually suffer from geometric and photometric distortions, both of which degrade the performance of OCR systems. In this paper, we present a novel method to compensate for undesirable geometric distortions, aiming to improve OCR results. Our methodology is based on finding text lines by dynamic local connectivity map and then applying a l...

Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting

With due respect to authors' rights, plagiarism detection is one of the critical problems in the field of text mining, and one that interests many researchers. The issue is taken seriously in higher academic institutions. Language-independent tools exist, but they do not yield reliable results because they ignore the special features of each language. Considering the paucit...

An Efficient Partition Technique to reduce the Attack Detection Time with Web based Text and PDF files

In this paper we propose an efficient partition technique for web-based files (jsp, html, php), text files (word, text) and PDF files. Our work is directed at reducing attack detection time. To that end we consider two main factors: first, minimizing the time, and second, file support. To minimize the time we use a partitioning method. We als...

Object Motion Detection in Video Frames Using Background Frame Matching

In this project we present the detection of motion in video frames using background frame matching. Video surveillance systems have become widely available to ensure safety and security in both the public and private sectors due to incidents of terrorist activity and other social problems. This paper proposes a novel motion detection method with a background model module and an obje...

Examining the Ethical Foundations of Compensation for Mistakes and Forgeries in the Preparation of Official Documents

Background: Preparing a formal transaction document is one of the specific duties of notaries public, which requires the use and observance of various substantive and formal conditions. Failure to comply with any of these conditions can lead to the annulment of the document by the court and the responsibility to compensate the clerks. Compensation by the clerks in various articles such as Artic...

Publication date: 1993